ARIMAX/SARIMAX/VAR Models
The goal of this project is to explore the relationship among water quality (i.e. sediment, turbidity) and quantity variables (i.e. stream discharge) and climate variables such as precipitation and temperature.
For SARIMAX model, the dependent variables in the SARIMAX model is turbidity, which can be affected by stream discharge (Pejman et al. (2009),Mukundan et al. (2013)) and climate variables (Wang (2014), Roosmalen, Christensen, and Sonnenborg (2007)).
For VAR model, the chosen variables are stream discharge, precipitation, and sediment. Due to the water cycle effect, stream discharge and precipitation have a relationship on both sides. Sediment is usually considered flow with stream flow and caused by high precipitation (Foster and Meyer (1977)). However, due to the water cycle effect and time series factor, there are researches indicate that high turbidity and sediment is an indication of soil erosion (De Vente et al. (2013)) and can cause later higher stream discharge. Thus, for VAR model, the three variables are used to figure out which process is more significant.
Data preparation
In this session, all the data including stream discharge, turbidity, sediment, precipitation, and temperature are processed. The steps include basic data cleaning, such as filling in missing values, trimming all the data to same time frame, and converting to time series format.
Please see the folded code for processing details.
Turbidity (2021-11-24 - 2023-01-28)
Final product is a time series from 2021-11-24 to 2022-12-31 with first 6 elements shown below.
Code
## read data
turb <- read.csv("./data/turbidity.csv", header = F, comment.char = "#")
turb <- turb[3:nrow(turb),1:9]
colnames(turb) <- c("prefix", "station","date","max","flag1","min","flag2","mean","flag3")
turb$date<-as.Date(turb$date,"%m/%d/%y")
## extract daily max
turb_max<- turb[c("date","max")]
turb_max$date<-as.Date(turb_max$date,"%m/%d/%y")
turb_max$max = as.numeric(turb_max$max)
## interpolate the missing data using moving average
library(imputeTS)
turb_max$max <- na_ma(turb_max$max, k = 4, weighting = "exponential")
# min(turb_max$date);max(turb_max$date)
## extract to 12/31/2022 as the climate max date
turb_max = turb_max[turb_max$date <= "2022-12-31", ]
# head(turb_max)
# min(turb_max$date);max(turb_max$date)
## sanity check
# str(turb_max)
# sum(is.na(turb_max))
turb_df <- data.frame(turb_max$date,turb_max$max)
# convert to time series
turb_ts <- read.zoo(turb_df)
head(turb_ts)2021-11-24 2021-11-25 2021-11-26 2021-11-27 2021-11-28 2021-11-29
6.4 6.3 5.9 8.8 5.6 6.0
Sediment (1950-04-12 - 2003-09-29)
Final product is a time series from 1993-9-29 to 2003-09-29 with first 6 elements shown below.
Code
## import and processing data
sed <- read.csv("./data/sediment.csv", header = F, comment.char = "#")
sed <- sed[3:nrow(sed),3:4]
colnames(sed) <- c("date","sed")
sed$date<-as.Date(sed$date,"%m/%d/%y")
sed$date <- as.Date(ifelse(sed$date > Sys.Date(),
format(sed$date, "19%y-%m-%d"),
format(sed$date)))
sed$sed <- as.numeric(sed$sed)
## interpolate the missing data using moving average
library(imputeTS)
sed$sed <- na_ma(sed$sed, k = 4, weighting = "exponential")
# min(sed$date);max(sed$date)
## extract from 1983-9-29 to 2003-09-29, 20-years of data
sed = sed[sed$date >= "1983-9-29", ]
sed_df <- data.frame(sed$date,sed$sed)
# convert to time series
sed_ts <- read.zoo(sed_df)
head(sed_ts)1983-09-29 1983-09-30 1983-10-01 1983-10-02 1983-10-03 1983-10-04
9 10 18 20 24 22
Stream discharge (1940-01-01 - 2023-02-14)
-
For turbidity model, 2021-11-24 to 2022-12-31, the first 6 elements are shown below.
Code
## import and processing data discharge <- read.csv("./data/discharge.csv", header = F, comment.char = "#") discharge <- discharge[3:dim(discharge)[1],] colnames(discharge) <- c("prefix", "station","date","discharge","flag") discharge <- discharge[c("date","discharge")] discharge$date<-as.Date(discharge$date,"%m/%d/%y") discharge$date <- as.Date(ifelse(discharge$date > Sys.Date(), format(discharge$date, "19%y-%m-%d"), format(discharge$date))) # min(discharge$date);max(discharge$date) ### discharge for turbidity model, 2021-11-24 to 2022-12-31 # select data after 2021-11-24 and before 2022-12-31 discharge_turb <- discharge[(discharge$date >= "2021-11-24") & (discharge$date <= "2022-12-31"), ] # change to numeric discharge_turb$discharge <- as.numeric(discharge_turb$discharge) ## interpolate the missing data using moving average library(imputeTS) discharge_turb$discharge <- na_ma(discharge_turb$discharge, k = 4, weighting = "exponential") ## sanity check # str(discharge_turb) # sum(is.na(discharge_turb)) discharge_turb_df <- data.frame(discharge_turb$date,discharge_turb$discharge) # convert to time series discharge_turb_ts <- read.zoo(discharge_turb_df) head(discharge_turb_ts)2021-11-24 2021-11-25 2021-11-26 2021-11-27 2021-11-28 2021-11-29 2470 2480 2280 2260 2510 2250 -
For sediment model, from 1983-9-29 to 2003-09-29 , the first 6 elements are shown below.
Code
### discharge for sediment model, 1983-9-29 to 2003-09-29 # select data after 1983-9-29 and before 2003-09-29 discharge_sed <- discharge[(discharge$date >= "1983-9-29") & (discharge$date <= "2003-09-29"), ] # change to numeric discharge_sed$discharge <- as.numeric(discharge_sed$discharge) ## interpolate the missing data using moving average library(imputeTS) discharge_sed$discharge <- na_ma(discharge_sed$discharge, k = 4, weighting = "exponential") ## sanity check # str(discharge_turb) # sum(is.na(discharge_turb)) discharge_sed_df <- data.frame(discharge_sed$date,discharge_sed$discharge) # convert to time series discharge_sed_ts <- read.zoo(discharge_sed_df) head(discharge_sed_ts)1983-09-29 1983-09-30 1983-10-01 1983-10-02 1983-10-03 1983-10-04 190 220 210 210 210 205
Precipitation and temperature (1940-01-01 - 2022-12-31)
-
For turbidity model, 2021-11-24 to 2022-12-31, the first 6 elements are shown below.
-
Precipitation
Code
## read data and aggregate df <- data.frame(matrix(ncol = 6, nrow = 0)) colnames(df) <- c("station","name","date","prcp","tmax","tmin") for (i in 1:3){ file_name <- paste0("climate",i,".csv") df_curr <- read.csv(paste0("./data/",file_name), header = T) colnames(df_curr) <- c("station","name","date","prcp","tmax","tmin") df <- rbind(df_curr,df) } ## group by date: each day, we have one average prcp, tmax, and tmin based on all stations in Toledo climate <- df %>% group_by(date) %>% summarise_at(vars(prcp,tmax,tmin), mean, na.rm = TRUE) climate$date<-as.Date(climate$date,"%Y-%m-%d") ## sanity check # any(duplicated(climate$date)) # min(climate$date);max(climate$date) ### climate for turbidity model, 2021-11-24 to 2022-12-31 # select data after 2021-11-24 and before 2022-12-31 climate_trub <- climate[(climate$date >= "2021-11-24") & (climate$date <= "2022-12-31"), ] ## interpolate the missing data using moving average climate_trub$prcp <- na_ma(climate_trub$prcp, k = 4, weighting = "exponential") climate_trub$tmax <- na_ma(climate_trub$tmax, k = 4, weighting = "exponential") climate_trub$tmin <- na_ma(climate_trub$tmin, k = 4, weighting = "exponential") ## sanity check #str(climate_trub) #sum(is.na(climate_trub)) #min(climate_trub$date);max(climate_trub$date) ## prcp prcp_trub_df <- data.frame(climate_trub$date,climate_trub$prcp) # convert to time series prcp_trub_ts <- read.zoo(prcp_trub_df) head(prcp_trub_ts)2021-11-24 2021-11-25 2021-11-26 2021-11-27 2021-11-28 2021-11-29 0.002142857 0.133750000 0.068666667 0.009285714 0.078666667 0.001538462 -
Maximum temperature
-
Minimum temperature
-
For sediment model, from 1983-9-29 to 2003-09-29 , the first 6 elements are shown below.
-
Precipitation
Code
### climate for sediment model, 1983-9-29 to 2003-09-29 # select data after 1983-9-29 and before 2003-09-29 climate_sed <- climate[(climate$date >= "1983-9-29") & (climate$date <= "2003-09-29"), ] ## interpolate the missing data using moving average climate_sed$prcp <- na_ma(climate_sed$prcp, k = 4, weighting = "exponential") climate_sed$tmax <- na_ma(climate_sed$tmax, k = 4, weighting = "exponential") climate_sed$tmin <- na_ma(climate_sed$tmin, k = 4, weighting = "exponential") ## sanity check #str(climate_sed) #sum(is.na(climate_sed)) #min(climate_sed$date);max(climate_sed$date) ## prcp prcp_sed_df <- data.frame(climate_sed$date,climate_sed$prcp) # convert to time series prcp_sed_ts <- read.zoo(prcp_sed_df) head(prcp_sed_ts)1983-09-29 1983-09-30 1983-10-01 1983-10-02 1983-10-03 1983-10-04 0.00 0.00 0.00 0.00 0.00 0.01 -
Maximum temperature
-
Minimum temperature
SARIMAX model of turbidity
Plotting the Data
(tab-df?) shows the data sample for turbidity model. Figure 1 shows the time series plots of the variables.
Code
turbidity_model_df <- data.frame(turb_df,discharge_turb_df$discharge_turb.discharge,prcp_trub_df$climate_trub.prcp,tmax_trub_df$climate_trub.tmax,tmin_trub_df$climate_trub.tmin)
colnames(turbidity_model_df)<-c("date","turbidity","discharge","precipitation","tmax","tmin")
knitr::kable(head(turbidity_model_df))| date | turbidity | discharge | precipitation | tmax | tmin |
|---|---|---|---|---|---|
| 2021-11-24 | 6.4 | 2470 | 0.0021429 | 45.33333 | 23.33333 |
| 2021-11-25 | 6.3 | 2480 | 0.1337500 | 50.00000 | 33.00000 |
| 2021-11-26 | 5.9 | 2280 | 0.0686667 | 43.66667 | 27.66667 |
| 2021-11-27 | 8.8 | 2260 | 0.0092857 | 34.00000 | 25.00000 |
| 2021-11-28 | 5.6 | 2510 | 0.0786667 | 39.33333 | 29.66667 |
| 2021-11-29 | 6.0 | 2250 | 0.0015385 | 41.66667 | 23.33333 |
Code
From SARIMA model session for stream discharge, all stream discharge need log transformation. Log transfor for precipitation would introduce NA, thus, the precipitation would stay the original form.
After log transformation, the time series of the variables are shown in Figure 2.
Code
Fitting the model using auto.arima()
The model using auto.arima() is shown below.
Code
Series: lg.turbidity_model.ts[, "turbidity"]
Regression with ARIMA(1,0,1) errors
Coefficients:
ar1 ma1 discharge precipitation tmax tmin
0.8948 -0.2850 0.4246 0.0878 0.0006 0.0022
s.e. 0.0282 0.0587 0.0233 0.0956 0.0037 0.0046
sigma^2 = 0.1824: log likelihood = -226.54
AIC=467.08 AICc=467.36 BIC=495.07
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.01048555 0.4239335 0.269546 -1.380152 8.165318 0.1412965
ACF1
Training set -0.01060533
Ljung-Box test
data: Residuals from Regression with ARIMA(1,0,1) errors
Q* = 59.777, df = 79, p-value = 0.9474
Model df: 2. Total lags used: 81
Figure 3 shows that the fitting model is an ARIMA(1,0,1) model. The ACF plot looks alright, all the errors are less than the threshold.
Fitting the model manually
- Linear regression model
Code
Call:
lm(formula = turbidity ~ precipitation + tmax + tmin + discharge,
data = lg.turbidity_model.ts)
Residuals:
Min 1Q Median 3Q Max
-1.66757 -0.46131 0.02679 0.43976 2.11160
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.2352589 0.2432210 -5.079 5.85e-07 ***
precipitation 0.1325368 0.1759714 0.753 0.452
tmax 0.0054724 0.0051649 1.060 0.290
tmin 0.0005487 0.0059018 0.093 0.926
discharge 0.5599929 0.0252455 22.182 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6838 on 398 degrees of freedom
Multiple R-squared: 0.5603, Adjusted R-squared: 0.5559
F-statistic: 126.8 on 4 and 398 DF, p-value: < 2.2e-16
- Residual fit
Figure 4 shows the ACF and PACF plots for the residuals. The ACF has a fading pattern, which indicates that differencing may be needed.
Code
-
First differencing
Figure 5: Residual diagonosis after linear regression model and differencing Figure 5 shows that p=1,2; q=1.
-
Second differencing with seasonality (annually, Figure 6).
Figure 6: Residual diagonosis after linear regression model and second differencing with annually seasonality - Finding the model parameters
Code
######################## Check for different combinations ######## #write a funtion SARIMA.c=function(p1,p2,q1,q2,d1,d2,P1,P2,Q1,Q2,data){ #K=(p2+1)*(q2+1)*(d2)*(P2+1)*(Q2+1) temp=c() D=1 s=12 i=1 temp= data.frame() ls=matrix(rep(NA,9*48),nrow=48) for (p in p1:p2) { for(q in q1:q2) { for(d in d1:d2){ for(P in P1:P2) { for(Q in Q1:Q2) { if(p+d+q+P+Q+D<=12) { skip_to_next <- FALSE model<- tryCatch({ Arima(data,order=c(p-1,d,q-1),seasonal=c(P-1,D,Q-1)) }, error = function(e) { skip_to_next <<- TRUE}) if(skip_to_next) { next } ls[i,]= c(p-1,d,q-1,P-1,D,Q-1,model$aic,model$bic,model$aicc) i=i+1 #print(i) } } } } } } temp= as.data.frame(ls) names(temp)= c("p","d","q","P","D","Q","AIC","BIC","AICc") temp }All AIC, BIC, and AICc give the best model as ARIMA(1,2,1,0,1,0)[365] for residual.
# q=0,1,; Q=1,2 and PACF plot: p=0,1,2; P=1,2, D=1 and d=0,1,2 output=SARIMA.c(p1=1,p2=3,q1=1,q2=2,d1=1,d2=3,P1=1,P2=3,Q1=1,Q2=3,data=res.fit) #output knitr::kable(output)p d q P D Q AIC BIC AICc 0 1 0 0 1 0 97.25568 98.86660 97.36997 0 2 0 0 1 0 128.65527 130.23879 128.77292 0 3 0 0 1 0 173.77761 175.33296 173.89883 0 1 1 0 1 0 92.99954 96.22138 93.35248 0 2 1 0 1 0 91.22853 94.39557 91.59217 0 3 1 0 1 0 127.30644 130.41713 127.68144 1 1 0 0 1 0 92.06895 95.29078 92.42189 1 2 0 0 1 0 102.87068 106.03772 103.23432 1 3 0 0 1 0 132.89719 136.00788 133.27219 1 1 1 0 1 0 93.60573 98.43849 94.33301 1 2 1 0 1 0 86.63715 91.38771 87.38715 1 3 1 0 1 0 102.32171 106.98775 103.09590 2 1 0 0 1 0 93.87314 98.70589 94.60041 2 2 0 0 1 0 102.36060 107.11116 103.11060 2 3 0 0 1 0 126.33729 131.00334 127.11149 2 1 1 0 1 0 95.58437 102.02804 96.83437 2 2 1 0 1 0 88.34723 94.68131 89.63755 2 3 1 0 1 0 101.89712 108.11852 103.23046 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA p d q P D Q AIC BIC AICc 11 1 2 1 0 1 0 86.63715 91.38771 87.38715p d q P D Q AIC BIC AICc 11 1 2 1 0 1 0 86.63715 91.38771 87.38715p d q P D Q AIC BIC AICc 11 1 2 1 0 1 0 86.63715 91.38771 87.38715However, the turbidity data has only over 400 observations. Here. I show best model is shown below with ARIMA(1,2,1,0,1,0) .
The Ljun-Box statistic and Q-Q plot look ok, but ACF plot could be better. This indicate the analysis need more data to improve the model.
Cross validation
n=length(res.fit)
k=200
#n-k=203
err1 = c()
err2 = c()
for(i in 1:(n-k))
{
xtrain <- res.fit[1:(k-1)+i] #observations from 1 to 350
xtest <- res.fit[k+i] #351th observation as the test set
fit <- Arima(xtrain, order=c(1,0,1),
include.drift=FALSE, method="ML")
fcast1 <- forecast(fit, h=4)
fit2 <- Arima(xtrain, order=c(1,2,1), seasonal=c(0,1,0),
include.drift=FALSE, method="ML")
fcast2 <- forecast(fit2, h=4)
#capture error for each iteration
# This is mean absolute error
err1 = c(err1, abs(fcast1$mean-xtest))
err2 = c(err2, abs(fcast2$mean-xtest))
# This is mean squared error
err3 = c(err1, (fcast1$mean-xtest)^2)
err4 = c(err2, (fcast2$mean-xtest)^2)
}
(MAE1=mean(err1)) # This is mean absolute error[1] 0.2824119
[1] 0.304012
[1] 0.4392394
[1] 0.4838416
Both MAE and RMSE suggest the manually fitted model, ARIMA(1,2,1,0,1,0) is better than the auto.arima() model, ARIMA(1,0,1).
Best model
fit <- Arima(lg.turbidity_model.ts[, "turbidity"], order=c(1,2,1),seasonal = c(0,1,0), xreg = xreg)
summary(fit)Series: lg.turbidity_model.ts[, "turbidity"]
Regression with ARIMA(1,2,1)(0,1,0)[365] errors
Coefficients:
ar1 ma1 discharge precipitation tmax tmin
-0.4437 -0.9995 0.5288 0.4683 -0.0003 -0.0088
s.e. 0.1576 0.0387 0.2065 0.3360 0.0124 0.0160
sigma^2 = 0.6045: log likelihood = -39.19
AIC=92.38 AICc=96.38 BIC=103.46
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.001783241 0.2121265 0.04570337 -0.3000952 1.606162 0.02395778
ACF1
Training set 0.008323962
\[ (1 + 0.4437B^1 + B^2) * (1 - B^{365}) * discharge(t) = 0.5288 * discharge(t) + 0.4683 * precipitation(t) - 0.0003 * tmax(t) - 0.0088 * tmin(t) - 0.9995\omega(t-1) * \omega(t) \]
Forcasting
- Turbidity
Code
Series: lg.turbidity_model[, "discharge"]
ARIMA(1,1,3)
Coefficients:
ar1 ma1 ma2 ma3
0.7870 -0.4365 -0.3785 -0.1139
s.e. 0.0576 0.0730 0.0527 0.0600
sigma^2 = 0.115: log likelihood = -134.08
AIC=278.16 AICc=278.31 BIC=298.15
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.01051251 0.3370327 0.237908 -0.3106121 3.198979 0.1109779
ACF1
Training set 3.796219e-05
- Precipitation
Code
Series: lg.turbidity_model[, "precipitation"]
ARIMA(0,0,1) with non-zero mean
Coefficients:
ma1 mean
0.1524 0.0924
s.e. 0.0510 0.0115
sigma^2 = 0.04005: log likelihood = 77.5
AIC=-149 AICc=-148.94 BIC=-137.01
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 4.109928e-05 0.1996324 0.1246119 -Inf Inf 0.7434769 -0.00235585
- Tmax
Code
Series: lg.turbidity_model[, "tmax"]
ARIMA(2,1,2)
Coefficients:
ar1 ar2 ma1 ma2
0.6389 -0.2293 -0.5212 -0.2855
s.e. 0.1075 0.0963 0.1069 0.1019
sigma^2 = 37.76: log likelihood = -1298.71
AIC=2607.42 AICc=2607.57 BIC=2627.4
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.0507898 6.106834 4.42749 -3.117993 11.34176 0.3541992
ACF1
Training set 0.0003491482
- Tmin
Code
Series: lg.turbidity_model[, "tmin"]
ARIMA(2,1,2)
Coefficients:
ar1 ar2 ma1 ma2
0.5618 -0.1251 -0.5784 -0.2393
s.e. 0.1661 0.1353 0.1648 0.1541
sigma^2 = 32.33: log likelihood = -1267.38
AIC=2544.76 AICc=2544.91 BIC=2564.74
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.005936972 5.650462 4.257408 5.316034 27.88954 0.4737379
ACF1
Training set -0.001209705
4 Forecast
Conclusion
The manually fitted model shows that the turbidity will be decrease in the near future based on stream discharge, precipitation, tmax, tmin. This agrees with the stream discharge annual pattern: lower in fall and winter, higher in spring due to snow melting and higher precipitation. However, the annual pattern may not be captured well, because that the sample size is small. Thus, this model need to be verified with larger dataset in the future.
VAR model
The variables include precipitation, stream discharge, precipitation. Because I want to explore the longer lag relationship among them, I will use monthly data here.
Code
sed_model_df <- data.frame(sed_df,discharge_sed$discharge, prcp_sed_ts)
colnames(sed_model_df)<-c("date","sediment","discharge","precipitation")
### can get monthly data
# Get mean value for each month
# take log of the discharge
sed_model_df["discharge"] <- log(sed_model_df["discharge"])
mean_data <- sed_model_df %>%
mutate(month = month(sed_model_df$date), year = year(sed_model_df$date)) %>%
group_by(year, month) %>%
summarise_at(vars(c("sediment","discharge","precipitation")), list(mean = mean))
var_month<-ts(mean_data[3:5],star=decimal_date(as.Date("1983-09-29",format = "%Y-%m-%d")),frequency = 12)
autoplot(var_month[,c(1:3)], facets=TRUE) +
xlab("Time") + ylab("") +
ggtitle("Monthly variables")$selection
AIC(n) HQ(n) SC(n) FPE(n)
8 3 1 8
$criteria
1 2 3 4 5 6 7
AIC(n) 0.7481091 0.7152077 0.5944947 0.5450676 0.5391289 0.4443612 0.4271302
HQ(n) 0.8388459 0.8603866 0.7941157 0.7991307 0.8476341 0.8073084 0.8445196
SC(n) 0.9730254 1.0750738 1.0893105 1.1748332 1.3038443 1.3440263 1.4617452
FPE(n) 2.1130447 2.0447859 1.8125173 1.7255152 1.7159209 1.5615824 1.5359804
8 9 10 11 12
AIC(n) 0.4008526 0.4433351 0.472206 0.4535587 0.4633184
HQ(n) 0.8726840 0.9696087 1.052922 1.0887164 1.1529182
SC(n) 1.5704173 1.7478496 1.911670 2.0279727 2.1726822
FPE(n) 1.4975083 1.5642969 1.612407 1.5853433 1.6041771
The results show p=1, 3 and 8 are with good SC, HQ, or AIC/FPE.
VAR(1)
VAR Estimation Results:
=========================
Endogenous variables: sediment_mean, discharge_mean, precipitation_mean
Deterministic variables: both
Sample size: 240
Log Likelihood: -1105.261
Roots of the characteristic polynomial:
0.642 0.2904 0.1438
Call:
vars::VAR(y = var_month, p = 1, type = "both")
Estimation results for equation sediment_mean:
==============================================
sediment_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 0.21918 0.08914 2.459 0.0147 *
discharge_mean.l1 15.81294 4.85883 3.254 0.0013 **
precipitation_mean.l1 -184.89650 85.00729 -2.175 0.0306 *
const -61.10324 35.63190 -1.715 0.0877 .
trend 0.10024 0.05492 1.825 0.0692 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 57.1 on 235 degrees of freedom
Multiple R-Squared: 0.1888, Adjusted R-squared: 0.175
F-statistic: 13.68 on 4 and 235 DF, p-value: 4.852e-10
Estimation results for equation discharge_mean:
===============================================
discharge_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 -0.0034489 0.0013087 -2.635 0.00896 **
discharge_mean.l1 0.7729023 0.0713339 10.835 < 2e-16 ***
precipitation_mean.l1 -2.0939191 1.2480161 -1.678 0.09472 .
const 2.1936310 0.5231221 4.193 3.9e-05 ***
trend 0.0001387 0.0008063 0.172 0.86354
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8383 on 235 degrees of freedom
Multiple R-Squared: 0.4004, Adjusted R-squared: 0.3902
F-statistic: 39.24 on 4 and 235 DF, p-value: < 2.2e-16
Estimation results for equation precipitation_mean:
===================================================
precipitation_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 8.486e-05 7.686e-05 1.104 0.2707
discharge_mean.l1 -1.225e-03 4.190e-03 -0.292 0.7703
precipitation_mean.l1 8.411e-02 7.330e-02 1.148 0.2523
const 9.280e-02 3.072e-02 3.021 0.0028 **
trend -3.091e-05 4.736e-05 -0.653 0.5145
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04923 on 235 degrees of freedom
Multiple R-Squared: 0.02277, Adjusted R-squared: 0.006138
F-statistic: 1.369 on 4 and 235 DF, p-value: 0.2454
Covariance matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 3260.098 33.46433 1.344236
discharge_mean 33.464 0.70268 0.017749
precipitation_mean 1.344 0.01775 0.002424
Correlation matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 1.0000 0.6992 0.4782
discharge_mean 0.6992 1.0000 0.4301
precipitation_mean 0.4782 0.4301 1.0000
VAR(3)
VAR Estimation Results:
=========================
Endogenous variables: sediment_mean, discharge_mean, precipitation_mean
Deterministic variables: both
Sample size: 238
Log Likelihood: -1058.944
Roots of the characteristic polynomial:
0.7776 0.7776 0.725 0.591 0.4909 0.4909 0.4119 0.4119 0.272
Call:
vars::VAR(y = var_month, p = 3, type = "both")
Estimation results for equation sediment_mean:
==============================================
sediment_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 0.09740 0.09174 1.062 0.28952
discharge_mean.l1 15.23421 5.99443 2.541 0.01171 *
precipitation_mean.l1 -124.09096 84.52566 -1.468 0.14347
sediment_mean.l2 0.16936 0.09667 1.752 0.08112 .
discharge_mean.l2 8.59342 7.28602 1.179 0.23946
precipitation_mean.l2 -243.88524 85.65124 -2.847 0.00481 **
sediment_mean.l3 0.21419 0.09470 2.262 0.02466 *
discharge_mean.l3 -12.73707 6.13465 -2.076 0.03900 *
precipitation_mean.l3 -246.08273 84.46078 -2.914 0.00393 **
const -0.11040 42.65437 -0.003 0.99794
trend 0.06540 0.05453 1.199 0.23160
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 54.01 on 227 degrees of freedom
Multiple R-Squared: 0.2885, Adjusted R-squared: 0.2572
F-statistic: 9.206 on 10 and 227 DF, p-value: 9.451e-13
Estimation results for equation discharge_mean:
===============================================
discharge_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 -4.097e-03 1.360e-03 -3.012 0.002893 **
discharge_mean.l1 7.648e-01 8.889e-02 8.604 1.28e-15 ***
precipitation_mean.l1 -1.069e+00 1.253e+00 -0.853 0.394548
sediment_mean.l2 1.430e-03 1.433e-03 0.997 0.319616
discharge_mean.l2 2.069e-01 1.080e-01 1.915 0.056755 .
precipitation_mean.l2 -4.415e+00 1.270e+00 -3.476 0.000609 ***
sediment_mean.l3 7.368e-04 1.404e-03 0.525 0.600335
discharge_mean.l3 -2.912e-01 9.097e-02 -3.201 0.001565 **
precipitation_mean.l3 -2.050e-01 1.252e+00 -0.164 0.870161
const 3.141e+00 6.325e-01 4.965 1.35e-06 ***
trend -1.043e-05 8.086e-04 -0.013 0.989718
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8009 on 227 degrees of freedom
Multiple R-Squared: 0.4676, Adjusted R-squared: 0.4441
F-statistic: 19.94 on 10 and 227 DF, p-value: < 2.2e-16
Estimation results for equation precipitation_mean:
===================================================
precipitation_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 8.239e-05 8.281e-05 0.995 0.321
discharge_mean.l1 -7.274e-03 5.411e-03 -1.344 0.180
precipitation_mean.l1 1.145e-01 7.630e-02 1.501 0.135
sediment_mean.l2 -5.021e-05 8.726e-05 -0.575 0.566
discharge_mean.l2 8.018e-03 6.577e-03 1.219 0.224
precipitation_mean.l2 -2.788e-02 7.731e-02 -0.361 0.719
sediment_mean.l3 1.935e-05 8.548e-05 0.226 0.821
discharge_mean.l3 5.376e-03 5.537e-03 0.971 0.333
precipitation_mean.l3 -9.352e-02 7.624e-02 -1.227 0.221
const 4.405e-02 3.850e-02 1.144 0.254
trend -1.272e-05 4.922e-05 -0.259 0.796
Residual standard error: 0.04875 on 227 degrees of freedom
Multiple R-Squared: 0.05668, Adjusted R-squared: 0.01513
F-statistic: 1.364 on 10 and 227 DF, p-value: 0.1982
Covariance matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 2916.663 30.02468 1.271526
discharge_mean 30.025 0.64136 0.018059
precipitation_mean 1.272 0.01806 0.002376
Correlation matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 1.0000 0.6942 0.4830
discharge_mean 0.6942 1.0000 0.4626
precipitation_mean 0.4830 0.4626 1.0000
VAR(8)
VAR Estimation Results:
=========================
Endogenous variables: sediment_mean, discharge_mean, precipitation_mean
Deterministic variables: both
Sample size: 233
Log Likelihood: -956.712
Roots of the characteristic polynomial:
0.979 0.979 0.8923 0.8331 0.8331 0.7842 0.7541 0.7541 0.7476 0.7476 0.7376 0.7376 0.7238 0.7109 0.7109 0.6854 0.6854 0.6632 0.6632 0.5744 0.5744 0.3774 0.3774 0.2044
Call:
vars::VAR(y = var_month, p = 8, type = "both")
Estimation results for equation sediment_mean:
==============================================
sediment_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + sediment_mean.l4 + discharge_mean.l4 + precipitation_mean.l4 + sediment_mean.l5 + discharge_mean.l5 + precipitation_mean.l5 + sediment_mean.l6 + discharge_mean.l6 + precipitation_mean.l6 + sediment_mean.l7 + discharge_mean.l7 + precipitation_mean.l7 + sediment_mean.l8 + discharge_mean.l8 + precipitation_mean.l8 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 4.184e-02 9.738e-02 0.430 0.66792
discharge_mean.l1 1.009e+01 7.470e+00 1.350 0.17837
precipitation_mean.l1 -2.040e+01 9.835e+01 -0.207 0.83592
sediment_mean.l2 1.477e-01 1.012e-01 1.460 0.14582
discharge_mean.l2 6.393e+00 7.970e+00 0.802 0.42342
precipitation_mean.l2 -1.228e+02 9.645e+01 -1.273 0.20445
sediment_mean.l3 1.086e-01 1.020e-01 1.065 0.28826
discharge_mean.l3 -3.570e-02 7.898e+00 -0.005 0.99640
precipitation_mean.l3 -2.420e+02 9.181e+01 -2.636 0.00902 **
sediment_mean.l4 1.238e-01 1.012e-01 1.223 0.22276
discharge_mean.l4 -4.606e+00 7.621e+00 -0.604 0.54621
precipitation_mean.l4 -9.518e+01 9.199e+01 -1.035 0.30206
sediment_mean.l5 2.048e-01 1.006e-01 2.035 0.04308 *
discharge_mean.l5 -9.153e+00 7.511e+00 -1.219 0.22439
precipitation_mean.l5 -5.019e+01 9.248e+01 -0.543 0.58792
sediment_mean.l6 6.143e-03 1.005e-01 0.061 0.95130
discharge_mean.l6 -4.396e+00 7.712e+00 -0.570 0.56933
precipitation_mean.l6 1.003e+02 9.256e+01 1.083 0.28002
sediment_mean.l7 -7.048e-03 9.992e-02 -0.071 0.94383
discharge_mean.l7 -2.602e+00 7.752e+00 -0.336 0.73752
precipitation_mean.l7 7.137e+01 1.007e+02 0.709 0.47937
sediment_mean.l8 2.015e-02 9.765e-02 0.206 0.83675
discharge_mean.l8 2.057e+00 6.766e+00 0.304 0.76140
precipitation_mean.l8 1.041e+02 9.944e+01 1.047 0.29615
const 5.646e+01 7.650e+01 0.738 0.46134
trend 7.107e-02 5.775e-02 1.231 0.21984
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52.91 on 207 degrees of freedom
Multiple R-Squared: 0.3283, Adjusted R-squared: 0.2471
F-statistic: 4.046 on 25 and 207 DF, p-value: 9.672e-09
Estimation results for equation discharge_mean:
===============================================
discharge_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + sediment_mean.l4 + discharge_mean.l4 + precipitation_mean.l4 + sediment_mean.l5 + discharge_mean.l5 + precipitation_mean.l5 + sediment_mean.l6 + discharge_mean.l6 + precipitation_mean.l6 + sediment_mean.l7 + discharge_mean.l7 + precipitation_mean.l7 + sediment_mean.l8 + discharge_mean.l8 + precipitation_mean.l8 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 -2.995e-03 1.381e-03 -2.169 0.031243 *
discharge_mean.l1 4.720e-01 1.059e-01 4.456 1.37e-05 ***
precipitation_mean.l1 2.394e+00 1.395e+00 1.717 0.087553 .
sediment_mean.l2 1.122e-03 1.435e-03 0.782 0.435001
discharge_mean.l2 2.122e-01 1.130e-01 1.877 0.061908 .
precipitation_mean.l2 -1.670e+00 1.368e+00 -1.221 0.223352
sediment_mean.l3 -1.096e-05 1.446e-03 -0.008 0.993960
discharge_mean.l3 -1.233e-02 1.120e-01 -0.110 0.912478
precipitation_mean.l3 -1.153e+00 1.302e+00 -0.886 0.376746
sediment_mean.l4 5.921e-04 1.435e-03 0.413 0.680377
discharge_mean.l4 -4.996e-02 1.081e-01 -0.462 0.644335
precipitation_mean.l4 4.998e-01 1.305e+00 0.383 0.702030
sediment_mean.l5 2.281e-03 1.427e-03 1.598 0.111497
discharge_mean.l5 -2.240e-01 1.065e-01 -2.103 0.036651 *
precipitation_mean.l5 4.568e-01 1.311e+00 0.348 0.727974
sediment_mean.l6 -3.587e-04 1.425e-03 -0.252 0.801439
discharge_mean.l6 -1.939e-01 1.094e-01 -1.773 0.077762 .
precipitation_mean.l6 4.613e+00 1.313e+00 3.514 0.000542 ***
sediment_mean.l7 -1.337e-04 1.417e-03 -0.094 0.924922
discharge_mean.l7 -7.190e-02 1.099e-01 -0.654 0.513830
precipitation_mean.l7 3.049e+00 1.428e+00 2.134 0.033992 *
sediment_mean.l8 -1.271e-04 1.385e-03 -0.092 0.926959
discharge_mean.l8 1.260e-01 9.594e-02 1.313 0.190688
precipitation_mean.l8 2.004e+00 1.410e+00 1.421 0.156716
const 4.751e+00 1.085e+00 4.380 1.89e-05 ***
trend 1.367e-04 8.189e-04 0.167 0.867569
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7503 on 207 degrees of freedom
Multiple R-Squared: 0.5602, Adjusted R-squared: 0.507
F-statistic: 10.55 on 25 and 207 DF, p-value: < 2.2e-16
Estimation results for equation precipitation_mean:
===================================================
precipitation_mean = sediment_mean.l1 + discharge_mean.l1 + precipitation_mean.l1 + sediment_mean.l2 + discharge_mean.l2 + precipitation_mean.l2 + sediment_mean.l3 + discharge_mean.l3 + precipitation_mean.l3 + sediment_mean.l4 + discharge_mean.l4 + precipitation_mean.l4 + sediment_mean.l5 + discharge_mean.l5 + precipitation_mean.l5 + sediment_mean.l6 + discharge_mean.l6 + precipitation_mean.l6 + sediment_mean.l7 + discharge_mean.l7 + precipitation_mean.l7 + sediment_mean.l8 + discharge_mean.l8 + precipitation_mean.l8 + const + trend
Estimate Std. Error t value Pr(>|t|)
sediment_mean.l1 3.528e-05 8.940e-05 0.395 0.694
discharge_mean.l1 3.381e-03 6.858e-03 0.493 0.623
precipitation_mean.l1 -1.801e-02 9.029e-02 -0.199 0.842
sediment_mean.l2 4.444e-08 9.287e-05 0.000 1.000
discharge_mean.l2 4.697e-03 7.317e-03 0.642 0.522
precipitation_mean.l2 -9.101e-02 8.855e-02 -1.028 0.305
sediment_mean.l3 7.905e-06 9.363e-05 0.084 0.933
discharge_mean.l3 -1.600e-03 7.251e-03 -0.221 0.826
precipitation_mean.l3 -3.523e-02 8.428e-02 -0.418 0.676
sediment_mean.l4 2.425e-05 9.292e-05 0.261 0.794
discharge_mean.l4 4.559e-04 6.996e-03 0.065 0.948
precipitation_mean.l4 -9.918e-02 8.446e-02 -1.174 0.242
sediment_mean.l5 -7.392e-06 9.238e-05 -0.080 0.936
discharge_mean.l5 6.098e-03 6.896e-03 0.884 0.378
precipitation_mean.l5 -6.192e-02 8.490e-02 -0.729 0.467
sediment_mean.l6 -8.211e-05 9.222e-05 -0.890 0.374
discharge_mean.l6 9.482e-03 7.080e-03 1.339 0.182
precipitation_mean.l6 -1.814e-01 8.498e-02 -2.134 0.034 *
sediment_mean.l7 -2.095e-06 9.174e-05 -0.023 0.982
discharge_mean.l7 -1.961e-03 7.117e-03 -0.276 0.783
precipitation_mean.l7 -1.092e-01 9.247e-02 -1.180 0.239
sediment_mean.l8 -3.165e-05 8.965e-05 -0.353 0.724
discharge_mean.l8 1.574e-03 6.211e-03 0.253 0.800
precipitation_mean.l8 -5.069e-02 9.129e-02 -0.555 0.579
const -1.248e-02 7.023e-02 -0.178 0.859
trend -1.588e-05 5.301e-05 -0.299 0.765
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04858 on 207 degrees of freedom
Multiple R-Squared: 0.1268, Adjusted R-squared: 0.02139
F-statistic: 1.203 on 25 and 207 DF, p-value: 0.2395
Covariance matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 2799.546 28.28649 1.38484
discharge_mean 28.286 0.56299 0.02334
precipitation_mean 1.385 0.02334 0.00236
Correlation matrix of residuals:
sediment_mean discharge_mean precipitation_mean
sediment_mean 1.0000 0.7125 0.5388
discharge_mean 0.7125 1.0000 0.6403
precipitation_mean 0.5388 0.6403 1.0000
For VAR(1), VAR(3) and VAR(8) models, when using sediment as dependent variable, both discharge and precipitation are significant. Discharge also related to sediment adn precipitation significantly. However, precipitation is not affected by either sediment or discharge. I have also tried using 10 years data and get VAR(7) model with good AIC. On the contrary, VAR(7) model shows precipitation are correlated with lag 7 of discharge and sediment. This may imply the periodicity of water cycle. And also the length of data various may give different model.
Cross validation
Code
var_month1 <- var_month[2:241,]
n=nrow(var_month)
k=76 #19*4
#n-k=164; 164/12=13.67;
rmse1 <- matrix(NA, 12*13,3)
rmse2 <- matrix(NA, 12*13,3)
rmse3 <- matrix(NA, 12*13,3)
year<-c()
# Convert data frame to time series object
ts_obj <- ts(var_month1[, c(1:3)], star=decimal_date(as.Date("1983-09-29",format = "%Y-%m-%d")),frequency = 12)
st <- tsp(ts_obj )[1]+(k-1)/12
for(i in 1:13) # i is a year
{
xtrain <- window(ts_obj, end=st + i-1)
xtest <- window(ts_obj, start=st + (i-1) + 1/12, end=st + i)
## p=1
fit <- vars::VAR(ts_obj, p=1, type='both')
fcast <- predict(fit, n.ahead = 12)
fsed<-fcast$fcst$sediment_mean
fprcp<-fcast$fcst$precipitation_mean
fdisc<-fcast$fcst$discharge_mean
ff<-data.frame(fsed[,1],fprcp[,1],fdisc[,1])
year<-st + (i-1) + 1/12
ff<-ts(ff,start=c(year,1),frequency = 12)
a = (i-1)*12+1
b= a+11
rmse1[c(a:b),] <- sqrt((ff-xtest)^2)
## p=3
fit2 <- vars::VAR(ts_obj, p=3, type='both')
fcast2 <- predict(fit2, n.ahead = 12)
fsed<-fcast2$fcst$sediment_mean
fprcp<-fcast2$fcst$precipitation_mean
fdisc<-fcast2$fcst$discharge_mean
ff2<-data.frame(fsed[,1],fprcp[,1],fdisc[,1])
year<-st + (i-1) + 1/12
ff2<-ts(ff2,start=c(year,1),frequency = 12)
rmse2[c(a:b),] <-sqrt((ff2-xtest)^2)
## p=8
fit3 <- vars::VAR(ts_obj, p=8, type='both')
fcast3 <- predict(fit3, n.ahead = 12)
fsed<-fcast3$fcst$sediment_mean
fprcp<-fcast3$fcst$precipitation_mean
fdisc<-fcast3$fcst$discharge_mean
ff3<-data.frame(fsed[,1],fprcp[,1],fdisc[,1])
year<-st + (i-1) + 1/12
ff3<-ts(ff3,start=c(year,1),frequency = 12)
rmse3[c(a:b),] <-sqrt((ff3-xtest)^2)
}
yr = rep(c(1983:1995),each =12)
mn = rep(c("September", "October", "November", "December","January","February","March","April","May","June","July", "August"),13)
rmse1 = data.frame(yr,mn,rmse1)
names(rmse1) =c("Year", "Month","Sediment","Discharge","Precipitation")
rmse2 = data.frame(yr,mn,rmse2)
names(rmse2) =c("Year", "Month","Sediment","Discharge","Precipitation")
rmse3 = data.frame(yr,mn,rmse3)
names(rmse3) =c("Year", "Month","Sediment","Discharge","Precipitation")
### sediment
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = Sediment,colour="VAR(1)")) +
geom_line(data = rmse2, aes(x = Year, y = Sediment,colour="VAR(3)")) +
geom_line(data = rmse3, aes(x = Year, y = Sediment,colour="VAR(8)")) +
labs(
title = "CV RMSE for Sediment",
x = "Date",
y = "RMSE") +
scale_color_manual(name = "VAR Models", values = c("VAR(1)" = "blue", "VAR(3)" = "red","VAR(8)" = "black"))Code
## discharge
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = Discharge,colour="VAR(1)")) +
geom_line(data = rmse2, aes(x = Year, y = Discharge,colour="VAR(3)")) +
geom_line(data = rmse3, aes(x = Year, y = Discharge,colour="VAR(8)")) +
labs(
title = "CV RMSE for Discharge",
x = "Date",
y = "RMSE") +
scale_color_manual(name = "VAR Models", values = c("VAR(1)" = "blue", "VAR(3)" = "red","VAR(8)" = "black"))Code
## discharge
ggplot() +
geom_line(data = rmse1, aes(x = Year, y = Precipitation,colour="VAR(1)")) +
geom_line(data = rmse2, aes(x = Year, y = Precipitation,colour="VAR(3)")) +
geom_line(data = rmse3, aes(x = Year, y = Precipitation,colour="VAR(8)")) +
labs(
title = "CV RMSE for Precipitation",
x = "Date",
y = "RMSE") +
scale_color_manual(name = "VAR Models", values = c("VAR(1)" = "blue", "VAR(3)" = "red","VAR(8)" = "black"))Code
Portmanteau Test (asymptotic)
data: Residuals of VAR object var3
Chi-squared = 76.221, df = 63, p-value = 0.1225
From Figure 7 ,Figure 8, and Figure 9, we can see the value of p has little impact on stream discharge. For precipitation, VAR(8) has lower RMSE compared to VAR(1) and VAR(3). For sediment, the results varied by time. VAR(1) perform better with lower RMSE around 1982-1985m 1990, 1992, 1995, while VAR(1) and VAR(8) both have lower RMSE in some years. The residuals for VAR(3) model pass the test for serial correlation, while others failed. ACF plots for Figure 10 looks ok. Thus, I choose VAR(3) for the overall model.
Forecast
Conclusion
The VAR models show that the long term correlation caused by water cycle is not significant or cannot be detected. The precipitation is not affected by discharge or sediment. The discharge is not affected by sediment. The strong relationships are still one way. Thus, SARIMAX model may be more suitble for these variables with sediment as dependent variables, and precipitation and discharge plus temperature as the independent variables.